Efficient Gaussian mixture model evaluation in voice conversion
Authors
Abstract
Voice conversion refers to the adaptation of the characteristics of a source speaker's voice to those of a target speaker. Gaussian mixture models (GMMs) have been found to be efficient in the voice conversion task. The GMM parameters are estimated from a training set with the goal of minimizing the mean squared error (MSE) between the transformed and target vectors. The quality of the GMM therefore plays an important role in achieving better voice conversion quality. This paper presents a very efficient approach for evaluating GMMs directly from the model parameters without using any test data, which facilitates improving the transformation performance, especially in embedded implementations. Although the proposed approach can be used in any application that uses GMM-based transformation, we take voice conversion as the example application throughout the paper. The proposed approach is tested in this context and compared against an MSE-based evaluation method. The results show that the proposed method is in line with all subjective observations and MSE results.
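As a rough illustration of the MSE-based evaluation that the abstract uses as a baseline (not the parameter-only method the paper proposes, whose details are not given here), the sketch below assumes the common joint-density form of GMM conversion, in which each mixture component contributes a linear regression weighted by its posterior probability. All names (`convert`, `mse`, `mu_x`, `cov_yx`, and so on) are hypothetical and chosen only for this example.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal N(mean, cov) at each row of x."""
    d = mean.shape[0]
    diff = x - mean
    solved = np.linalg.solve(cov, diff.T).T  # rows hold cov^{-1} (x - mean)
    expo = -0.5 * np.sum(diff * solved, axis=1)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(expo) / norm

def convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Map source vectors x (n, d) to the target space (assumed joint-density GMM)."""
    n_mix = len(weights)
    # Posterior probability of each mixture component given the source vector.
    lik = np.stack(
        [weights[i] * gaussian_pdf(x, mu_x[i], cov_xx[i]) for i in range(n_mix)],
        axis=1,
    )
    post = lik / lik.sum(axis=1, keepdims=True)
    y = np.zeros((x.shape[0], mu_y.shape[1]))
    for i in range(n_mix):
        # Per-component regression: mu_y + cov_yx cov_xx^{-1} (x - mu_x).
        local = mu_y[i] + (cov_yx[i] @ np.linalg.solve(cov_xx[i], (x - mu_x[i]).T)).T
        y += post[:, [i]] * local
    return y

def mse(converted, target):
    """Mean squared error between converted and target vectors (needs test data)."""
    return float(np.mean(np.sum((converted - target) ** 2, axis=1)))
```

Note that `mse` requires aligned source and target test vectors, which is exactly the dependency the paper's parameter-based evaluation aims to avoid. The sketch also omits numerical safeguards such as log-domain likelihoods.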
Similar resources
Straight-based voice conversion algorithm based on Gaussian mixture model
The voice conversion algorithm based on the Gaussian mixture model (GMM) has also been proposed by Stylianou et al. In this algorithm, the acoustic space of a speaker is represented continuously. In this paper, we apply this GMM-based voice conversion algorithm to STRAIGHT, proposed by Kawahara et al., which is recognized as a high-quality vocoder. In order to evaluate this voice conversion algor...
Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems
This article examines methods for optimizing the performance of GMM-based voice conversion systems, in which the GMM method is introduced as the basic method for improving voice conversion performance. In the current methods, because a single conversion function is used to convert all speech units and statistical averaging causes subsequent spectral smoothing, we will observe quality ...
Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
The performance of voice conversion has been considerably improved through statistical modeling of spectral sequences. However, the converted speech still contains traces of artificial sounds. To alleviate this, it is necessary to statistically model a source sequence as well as a spectral sequence. In this paper, we introduce STRAIGHT mixed excitation to a framework of the voice conversion bas...
A system for voice conversion based on probabilistic classification and a harmonic plus noise model
Voice conversion is defined as modifying the speech signal of one speaker (source speaker) so that it sounds as if it had been pronounced by a different speaker (target speaker). This paper describes a system for efficient voice conversion. A novel mapping function is presented which associates the acoustic space of the source speaker with the acoustic space of the target speaker. The proposed ...
Implementation of Computationally Efficient Real-Time Voice Conversion
This paper presents an implementation of real-time processing of statistical voice conversion (VC) based on Gaussian mixture models (GMMs). To develop VC applications for enhancing our human-to-human speech communication, it is essential to implement real-time conversion processing. Moreover, it is useful to reduce computational complexity of the conversion processing for making VC applications...